Sparse similarity-preserving hashing

نویسندگان

  • Jonathan Masci
  • Alexander M. Bronstein
  • Michael M. Bronstein
  • Pablo Sprechmann
  • Guillermo Sapiro
چکیده

In recent years, a lot of attention has been devoted to efficient nearest neighbor search by means of similarity-preserving hashing. One of the plights of existing hashing techniques is the intrinsic trade-off between performance and computational complexity: while longer hash codes allow for lower false positive rates, it is very difficult to increase the embedding dimensionality without incurring in very high false negatives rates or prohibiting computational costs. In this paper, we propose a way to overcome this limitation by enforcing the hash codes to be sparse. Sparse highdimensional codes enjoy from the low false positive rates typical of long hashes, while keeping the false negative rates similar to those of a shorter dense hashing scheme with equal number of degrees of freedom. We use a tailored feed-forward neural network for the hashing function. Extensive experimental evaluation involving visual and multimodal data shows the benefits of the proposed method.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Ranking Preserving Hashing for Fast Similarity Search

Hashing method becomes popular for large scale similarity search due to its storage and computational efficiency. Many machine learning techniques, ranging from unsupervised to supervised, have been proposed to design compact hashing codes. Most of the existing hashing methods generate binary codes to efficiently find similar data examples to a query. However, the ranking accuracy among the ret...

متن کامل

State of the Art in Similarity Preserving Hashing Functions

One of the goals of digital forensics is to analyse the content of digital devices by reducing its size and complexity. Similarity preserving hashing functions help to accomplish that mission through a resemblance comparison between different files. Some of the best-known functions of this type are the context-triggered piecewise hashing functions, which create a signature formed by several has...

متن کامل

Similarity preserving compressions of high dimensional sparse data

The rise of internet has resulted in an explosion of data consisting of millions of articles, images, songs, and videos. Most of this data is high dimensional and sparse. The need to perform an efficient search for similar objects in such high dimensional big datasets is becoming increasingly common. Even with the rapid growth in computing power, the bruteforce search for such a task is impract...

متن کامل

Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2

Hash functions are a widespread class of functions in computer science and used in several applications, e.g. in computer forensics to identify known files. One basic property of cryptographic hash functions is the avalanche effect that causes a significantly different output if an input is changed slightly. As some applications also need to identify similar files (e.g. spam/virus detection) th...

متن کامل

Locality Preserving Hashing

Hashing has recently attracted considerable attention for large scale similarity search. However, learning compact codes with good performance is still a challenge. In many cases, the real-world data lies on a low-dimensional manifold embedded in high-dimensional ambient space. To capture meaningful neighbors, a compact hashing representation should be able to uncover the intrinsic geometric st...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1312.5479  شماره 

صفحات  -

تاریخ انتشار 2013